reversible markov chain
P(x,y)dy=1 x K. Wedefinethefollowingstates,whichcapturekeypropertiesofthequantumwalk: |ψxi: = Z
In this section, we first define the quantum walk operators and introduce some spectral properties. Then,theeigenvaluesofW are n 1,λj q 1 λ2ji o . Let {(λi,fi)} be the set of eigenvalues and eigenfunctions of P, and |ψii be the eigenvectors of the corresponding quantum walk operator W. Let ρ0 be a probability density that is a warm start for ρ and mixes up to TV-distancein t steps of M. Furthermore, assume that kρ/ρ0k = R Theorem 5 (Quantum walk implementation cost). Let M0,M1 be two ergodic reversible Markov chains with stationary distributions π0,π1, respectively. Suppose π0 is β0-warm with respect to M1 and mixes up to total variation distancein t0() steps.
Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains
Overmars, Maik, Goseling, Jasper, Boucherie, Richard
We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function approximation can lead to divergence of the algorithm. Existing results for this setting modify the algorithm, for instance by reweighing the updates using importance sampling. This establishes convergence at the expense of additional complexity. In contrast, our approach is to analyse the standard algorithm, but to restrict our attention to the class of reversible Markov chains. We demonstrate convergence under this mild reversibility condition on the structure of the chain, which in many applications can be assumed using domain knowledge. In particular, we establish a convergence guarantee under an upper bound on the discount factor in terms of the difference between the on-policy and off-policy process. This improves upon known results in the literature that state that convergence holds for a sufficiently small discount factor by establishing an explicit bound. Convergence is with probability one and achieves projected Bellman error equal to zero. To obtain these results, we adapt the stochastic approximation framework that was used by Tsitsiklis and Van Roy [1997 for the on-policy case, to the off-policy case. We illustrate our results using different types of reversible Markov chains, such as one-dimensional random walks and random walks on a weighted graph.
Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path
Daniel J. Hsu, Aryeh Kontorovich, Csaba Szepesvari
The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge.
Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path
This article provides the first procedure for computing a fully data-dependent interval that traps the mixing time t_{mix} of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge. The interval is constructed around the relaxation time t_{relax}, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a \sqrt{n} rate, where n is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy.
Discrete distributions are learnable from metastable samples
Jayakumar, Abhijith, Lokhov, Andrey Y., Misra, Sidhant, Vuffray, Marc
Physically motivated stochastic dynamics are often used to sample from high-dimensional distributions. However such dynamics often get stuck in specific regions of their state space and mix very slowly to the desired stationary state. This causes such systems to approximately sample from a metastable distribution which is usually quite different from the desired, stationary distribution of the dynamic. We rigorously show that, in the case of multi-variable discrete distributions, the true model describing the stationary distribution can be recovered from samples produced from a metastable distribution under minimal assumptions about the system. This follows from a fundamental observation that the single-variable conditionals of metastable distributions that satisfy a strong metastability condition are on average close to those of the stationary distribution. This holds even when the metastable distribution differs considerably from the true model in terms of global metrics like Kullback-Leibler divergence or total variation distance. This property allows us to learn the true model using a conditional likelihood based estimator, even when the samples come from a metastable distribution concentrated in a small region of the state space. Explicit examples of such metastable states can be constructed from regions that effectively bottleneck the probability flow and cause poor mixing of the Markov chain. For specific cases of binary pairwise undirected graphical models (i.e. Ising models), we extend our results to further rigorously show that data coming from metastable states can be used to learn the parameters of the energy function and recover the structure of the model.
Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path
The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge.
Variable-Complexity Weighted-Tempered Gibbs Samplers for Bayesian Variable Selection
Subset weighted-Tempered Gibbs Sampler (wTGS) has been recently introduced by Jankowiak to reduce the computation complexity per MCMC iteration in high-dimensional applications where the exact calculation of the posterior inclusion probabilities (PIP) is not essential. However, the Rao-Backwellized estimator associated with this sampler has a high variance as the ratio between the signal dimension and the number of conditional PIP estimations is large. In this paper, we design a new subset weighted-Tempered Gibbs Sampler (wTGS) where the expected number of computations of conditional PIPs per MCMC iteration can be much smaller than the signal dimension. Different from the subset wTGS and wTGS, our sampler has a variable complexity per MCMC iteration. We provide an upper bound on the variance of an associated Rao-Blackwellized estimator for this sampler at a finite number of iterations, $T$, and show that the variance is $O\big(\big(\frac{P}{S}\big)^2 \frac{\log T}{T}\big)$ for a given dataset where $S$ is the expected number of conditional PIP computations per MCMC iteration. Experiments show that our Rao-Blackwellized estimator can have a smaller variance than its counterpart associated with the subset wTGS.
A Geometric Reduction Approach for Identity Testing of Reversible Markov Chains
Wolfer, Geoffrey, Watanabe, Shun
We consider the problem of testing the identity of a reversible Markov chain against a reference from a single trajectory of observations. Employing the recently introduced notion of a lumping-congruent Markov embedding, we show that, at least in a mildly restricted setting, testing identity to a reversible chain reduces to testing to a symmetric chain over a larger state space and recover state-of-the-art sample complexity for the problem.